Background: Emerging evidence suggests that viral infections play a crucial role in the pathogenesis of certain leukemias, with viral integration and its interaction with the human genome potentially contributing to disease onset and progression. Since Astolfi et al. identified the first case of TTMV::RARA-APL in a 6-year-old child, more than ten cases have been identified in the past two years. However, no other viruses have been reported to form fusion genes with human genes in leukemia. Our study leverages a vast repository of poly(A) RNA-seq data from AML patients to develop a sensitive bioinformatics workflow. This workflow aims to uncover the molecular footprints left by viruses in the leukemic transcriptome and discover novel viral-human fusion events that may have been previously overlooked.

Methods: The cDNA library construction and sequencing were performed as previously described (Chen, Blood Cancer J 2021). Raw RNA sequencing data were first processed with fastp for adapter trimming, poly(A/T) trimming, and low-quality read filtering. The high-quality reads were then mapped to the human reference genome (GRCh38) using the STAR aligner (version 2.7.10b). SAMtools was used to sort the aligned BAM files by chromosomal position and to extract properly paired reads. A custom Perl script was written to extract the reads without a proper paired alignment. SPAdes was used to assemble the unmapped reads into scaffolds, which were then annotated by BLAST against the NCBI RefSeq Virus database, using an expected value (E-value) cutoff of less than 1e-10 and an alignment percentage greater than 80%. Because few TTV and TTMV sequences are available in the NCBI RefSeq Virus database, we improved annotation accuracy by adding TTV and TTMV sequences from the NCBI Nucleotide database to our reference database. Soft-clipped reads were identified from the BAM alignments and grouped by the genomic positions at which soft clipping occurred. CAP3 was then used to assemble the soft-clipped reads of each group into consensus sequences (contigs). The soft-clipped contigs were aligned to the unmapped scaffolds using BLAST to reconstruct the fusion transcripts.
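
The soft-clip grouping step can be illustrated with a minimal Python sketch. It is not the authors' implementation (the abstract mentions only a custom Perl script and standard tools); the function names, the `min_clip` threshold of 10 bases, and the toy coordinates are all assumptions made for illustration. It parses the SAM POS and CIGAR fields directly and groups reads by the reference position adjacent to each soft clip.

```python
import re
from collections import defaultdict

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_breakpoints(pos, cigar, min_clip=10):
    """Return (side, reference_position) pairs for soft clips in one alignment.

    pos      -- 1-based leftmost mapped position (SAM POS field)
    cigar    -- CIGAR string (SAM CIGAR field)
    min_clip -- hypothetical minimum clip length; not taken from the abstract
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips carry no sequence, so ignore them when looking at read ends.
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return []

    breakpoints = []
    # Left soft clip: the breakpoint is the first aligned reference base.
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        breakpoints.append(("left", pos))
    # Right soft clip: the breakpoint is the last aligned reference base.
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        breakpoints.append(("right", pos + ref_len - 1))
    return breakpoints


def group_soft_clipped_reads(records, min_clip=10):
    """Group read names by (chromosome, breakpoint position, clip side).

    records -- iterable of (read_name, chromosome, pos, cigar) tuples
    """
    groups = defaultdict(list)
    for name, chrom, pos, cigar in records:
        for side, bp in soft_clip_breakpoints(pos, cigar, min_clip):
            groups[(chrom, bp, side)].append(name)
    return dict(groups)


# Toy alignments with made-up coordinates, for illustration only.
example_records = [
    ("r1", "chr17", 1000, "60M40S"),
    ("r2", "chr17", 1020, "40M60S"),
    ("r3", "chr17", 1060, "30S70M"),
    ("r4", "chr17", 900, "100M"),
]
example_groups = group_soft_clipped_reads(example_records)
```

In a pipeline of this kind, the reads in each group would then be passed to an assembler such as CAP3 to build one consensus contig per breakpoint.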

Results: A cohort of 712 AML cases was enrolled, including 154 children and 556 adults (adults defined as ≥18 years), with an age range of 0-89 years (median 42 years) and a male-to-female ratio of 375:337. Viral sequences were identified in 523 cases (73.46%), with an average of 4.25 viruses per sample. Duplodnaviria was found in 414 cases (58.15%), Riboviria in 220 cases (30.90%), Varidnaviria in 38 cases (5.34%), Monodnaviria in 14 cases (1.97%), and other viruses in 68 cases (9.55%). Alphatorquevirus (Torque Teno Viruses, TTVs) and Betatorquevirus (Torque Teno mini Viruses, TTMVs) made up most of the Anelloviridae fraction, with TTV found in 101 cases (14.19%) and TTMV in 30 cases (4.21%). The most frequently identified pathogens in this cohort were Human herpesvirus 1 (HHV-1) in 297 cases (41.71%), Cytomegalovirus (CMV) in 239 cases (33.57%), Epstein-Barr virus (EBV) in 25 cases (3.51%), and Human adenovirus C (HAdV-C) in 24 cases (3.37%). Using this analytical approach, we detected one case with the TTMV::RARA fusion in this cohort. We also re-analyzed the poly(A) transcriptome sequencing data from several TTMV::RARA cases previously reported by our center, and the re-analysis confirmed that the TTMV::RARA fusion could be successfully identified with this workflow. No other fusion between a virus and a human coding gene was found in this cohort, underscoring the rarity of virus-human fusion genes.

Conclusions: We present an analytical workflow for the detection of viral gene transcripts and viral-human gene fusions, providing a reference for future studies of tumor-associated viruses using poly(A) transcriptome sequencing data. This workflow facilitates the discovery of viral integration events and the identification of expressed viral transcripts in various diseases.

Disclosures

No relevant conflicts of interest to declare.
